ABSTRACT

A Rasch measurement model can be constructed to meet the requirements of
rank ordered data. If multiple rankings of the same objects are available,
then the parameters of the objects can be estimated, along with their
standard errors and also with statistics summarizing the fit of the data to
the measurement model. This paper summarizes the relevant theoretical
principles associated with rank ordering and presents an example of this
sort of analysis. The example includes H. Polskin's (1988) rankings of seven
play-by-play baseball announcers on six specific items of performance. The
application of the principles of fundamental measurement to rank ordered
data provides the means to convert entirely local rankings into
generalizable measures of the latent abilities. Moreover, fit statistics for
each object and for each ordering enable a determination of the success of
the ranking process as a measurement operation. (TJH)

Abstract:

A Rasch measurement model can be constructed to meet the requirements of rank
ordered data. If multiple rankings of the same objects are available, then
the parameters of the objects can be estimated, along with their standard
errors and also statistics summarising the fit of the data to the measurement
model. An example analysis is provided.

Key-words: Rank order, Rasch measurement. 



I. Introduction. 

"Examiners who are asked to place answer books in rank order, or order of 
merit, are asked to do a task which is far simpler for human judgement than 
is the assigning of absolute marks" (Harper 1976 p. 255). The use of 
judge-created rankings of the performance of examinees on each test item, 
instead of judge-awarded scores, removes from the test analysis the severity 
of the judges, the difficulty of the test items and the arbitrary nature and 
idiosyncratic implementation of rating scales. 

Ranking would also appear to remove the foundational component identified by 
Rasch for measurement models in psychology: "The possible behavior of a pupil 
is described by means of a probability that he solves the task" (Rasch 1980 
p. 11). A ranking of examinees contains no indication of what level of 
success the examinees attained on the particular item on which they were 
judged. It does, however, contain information about their relative levels of 
success. 

It has been observed that judges differ considerably in the rankings they 
assign: "The [examinee's performance with] the highest degree of agreement 
still covered nearly one-third of the range of ranks, while the average 
[range of ranking a performance] included nearly two-thirds of the available 
ranks" (Harper 1976 p. 14). It is this variation in the rankings across 
judges which provides the stochastic element necessary for Rasch measurement. 



II. The fundamental measurement model for paired objects. 

A comparison of the performance of two objects (e.g. Om and On) across
numerous replications of a given agent (e.g. a test item) yields counts of
the three possible outcomes:

1) Fmn, the frequency with which Om out-performs On.

2) Fnm, the frequency with which On out-performs Om.

3) the frequency with which they perform at the same level.

For the purposes of this discussion, the discrimination of performance is
assumed to be so fine that identical performance levels never occur. Thus, a
comparison of the performance levels of these objects is Fmn/Fnm, which
becomes, in the limit, Pmn/Pnm, where Pmn is the probability that Om
out-performs On and Pnm is similarly defined.

We can also define object O0, whose performance level is at the local origin
of the measurement scale. We can similarly compare the performance of Om with
O0, yielding Pm0/P0m, and also On with O0, yielding Pn0/P0n.

Following Rasch, "if a relationship between two or more variables is to be 
considered really important, as more than an ad hoc description of a very 
limited set of data - if a more or less general interdependence may be 
considered in force - the relationship should be found in several sets of data 
which differ materially in some relevant respects" (Rasch 1980 p. 9). In 
our case, this implies that the results of a direct comparison of Om and On
should lead to the same conclusion as a comparison of Om with On via O0. This
requirement for generalizability thus leads to

$$ \frac{P_{mn}}{P_{nm}} = \frac{P_{m0}}{P_{0m}} \Big/ \frac{P_{n0}}{P_{0n}} \tag{1} $$

but Pm0/P0m is the performance of Om relative to a measure defined to be at
the origin of the scale, and so is a constant, Am. Similarly, Pn0/P0n is a
constant, An. Then, taking logarithms,

$$ \log(P_{mn}/P_{nm}) = \log(A_m) - \log(A_n) \tag{2} $$

or, reparameterizing, this becomes the measurement model for paired objects,

$$ \log(P_{mn}/P_{nm}) = B_m - B_n \tag{3} $$

where

Pmn is the probability that object Om out-performs object On
Pnm is the probability that object On out-performs object Om
Bm is the measure of object Om
Bn is the measure of object On.
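
As a small illustration of model (3), the following sketch converts two
measures, expressed in logits, into the probability that one object
out-performs the other. The function name `pair_probability` and the example
measures are assumptions made for illustration only; they do not come from the
analysis reported in this paper.

```python
import math


def pair_probability(b_m: float, b_n: float) -> float:
    """Probability that Om out-performs On when log(Pmn/Pnm) = Bm - Bn."""
    return 1.0 / (1.0 + math.exp(-(b_m - b_n)))


# Hypothetical measures, in logits.
b_m, b_n = 1.0, 0.25
p_mn = pair_probability(b_m, b_n)
p_nm = pair_probability(b_n, b_m)

# Taking the log-odds of the two probabilities recovers Bm - Bn.
log_odds = math.log(p_mn / p_nm)
```

Only the difference between the two measures matters, so adding the same
constant to both leaves the probability unchanged.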



III. Extending the measurement model from pairs to rankings. 

If a ranking is of only two objects, then the measurement model for paired
objects applies directly. Thus the probability, Rab, of observing object Oa
ranked higher than object Ob is given by

$$ R_{ab} = P_{ab} = \frac{P_{ab}}{P_{ab} + P_{ba}} \tag{4} $$

where the denominator contains 2! = 2 terms, representing all the
possible valid numerators for ordering two objects.

The ranking of three objects, Oa, Ob, Oc, can be regarded as a set of three
paired rankings, but with the constraint that if Oa is ranked higher than Ob,
and Ob is ranked higher than Oc, then Oa must be ranked higher than Oc. The
probabilities of their eight theoretically possible paired relationships are
shown in Table 1.

Probability of           Representation
independent pairing      as rank order

Pab*Pac*Pbc              R(Oa,Ob,Oc)
Pab*Pac*Pcb              R(Oa,Oc,Ob)
Pab*Pca*Pcb              R(Oc,Oa,Ob)
Pba*Pac*Pbc              R(Ob,Oa,Oc)
Pba*Pca*Pbc              R(Ob,Oc,Oa)
Pba*Pca*Pcb              R(Oc,Ob,Oa)
Pab*Pca*Pbc              inconsistent
Pba*Pac*Pcb              inconsistent

Table 1. Probabilities of all possible paired comparisons of three
objects. The contents of R( ) represent the ordering of the objects.

The effect of the constraint on the pairings imposed by ranking is the
determination that two of the possible paired combinations of objects are
inconsistent and can never be observed. Apart from this constraint, the
probability of observing any particular rank ordering is assumed to depend
only on the paired comparison of the objects and not to involve any other
characteristics of the sample of objects. This is equivalent to the "local
independence" axiom of other Rasch models. Thus the comparison of the
objects manifested in the ranking is as "sample-free" as possible. If, for
any particular data set of rankings, this is not the case, then the data set
cannot be expected to fit the measurement model presented here. Fit
statistics can diagnose this eventuality.
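
The count of six consistent and two inconsistent pairings can be checked by
enumeration. The sketch below is only an illustration: the helper `classify`
and the object labels are hypothetical names, and the test for consistency
(win counts of 2, 1 and 0) is one convenient way to detect the cyclic cases for
three objects.

```python
from itertools import product

OBJECTS = ("a", "b", "c")
PAIRS = (("a", "b"), ("a", "c"), ("b", "c"))


def classify(outcome):
    """Return the rank order implied by three paired results, or None if cyclic.

    outcome[i] is True when the first object of PAIRS[i] beats the second.
    """
    wins = {obj: 0 for obj in OBJECTS}
    for (first, second), first_wins in zip(PAIRS, outcome):
        wins[first if first_wins else second] += 1
    # A consistent ordering of three objects has win counts 2, 1 and 0;
    # a cycle gives every object exactly one win.
    if sorted(wins.values()) != [0, 1, 2]:
        return None
    return tuple(sorted(OBJECTS, key=lambda obj: -wins[obj]))


results = [classify(outcome) for outcome in product((True, False), repeat=3)]
consistent = [r for r in results if r is not None]
inconsistent_count = results.count(None)
```

The six entries of `consistent` correspond to the six rank orders, and the two
remaining outcomes are the cyclic pairings that a ranking can never produce.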

Considering the possible rankings, if Rab is the probability that Oa is
ranked higher than Ob in the rank ordered data, then

$$ \frac{R_{ab}}{R_{ba}} = \frac{\text{Probability of observing } O_a \text{ higher than } O_b}{\text{Probability of observing } O_b \text{ higher than } O_a} \tag{5} $$

$$ = \frac{P_{ab}P_{ac}P_{bc} + P_{ab}P_{ac}P_{cb} + P_{ab}P_{ca}P_{cb}}{P_{ba}P_{ac}P_{bc} + P_{ba}P_{ca}P_{bc} + P_{ba}P_{ca}P_{cb}} \tag{6} $$

$$ = \frac{P_{ab}(1 - P_{ca}P_{bc})}{P_{ba}(1 - P_{ac}P_{cb})} \tag{7} $$

since Rab = 1 - Rba, then

$$ R_{ab} = \frac{P_{ab}(1 - P_{ca}P_{bc})}{P_{ab}(1 - P_{ca}P_{bc}) + P_{ba}(1 - P_{ac}P_{cb})} \tag{8} $$

$$ R_{ab} = \frac{P_{ab}(1 - P_{ca}P_{bc})}{1 - P_{ab}P_{ca}P_{bc} - P_{ba}P_{ac}P_{cb}} \tag{9} $$

$$ R_{ab} = \frac{\text{Probability of observing } O_a \text{ ranked higher than } O_b}{\text{Probability of all possible rankings}} \tag{10} $$

Rbc and Rac are similarly obtained. These probabilities are not independent,
as the following identity makes clear:

$$ R_{abc} \equiv R_{ab} \cap R_{bc} \equiv R_{ab} \cap R_{bc} \cap R_{ac} \tag{11} $$

where Rabc is the probability of observing the ranking R(Oa,Ob,Oc)
      ∩ is the intersection of the sample spaces.

In particular, in general,

$$ R_{abc} < R_{ab} \cdot R_{bc} \tag{12} $$

with the precise result that

$$ R_{abc} = \frac{\text{Probability of } R(O_a,O_b,O_c)}{\text{Probability of all possible rankings}} \tag{13} $$

$$ = \frac{P_{ab}P_{ac}P_{bc}}{1 - P_{ab}P_{ca}P_{bc} - P_{ba}P_{ac}P_{cb}} \tag{14} $$



Numbering the objects arbitrarily, O1, O2, O3, then the probability of the
observed rank ordering, whatever it is, is given by

$$ R(\{3\}) = \frac{\prod_{j=1}^{3}\prod_{k=j+1}^{3}\left(X_{jk}P_{jk} + X_{kj}P_{kj}\right)}{\Sigma R(\{3\})} \tag{15} $$

where

R({3}) is the probability of a particular ranking of 3 objects
Xjk = 1 if Oj is ranked higher than Ok,
    = 0 otherwise
Xkj = 1 - Xjk
ΣR({3}) is the sum of all possible numerators and contains one
term for every permutation of 3 objects, i.e. 3! = 6 terms.
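
The normalization in (15) can be checked numerically against the closed forms
(9) and (14). The following sketch assumes hypothetical pairwise probabilities
for three objects; the function name `ranking_probabilities` and the numbers
0.7, 0.8 and 0.6 are illustrative choices, not values from this paper.

```python
from itertools import permutations


def ranking_probabilities(p):
    """Probability of each ranking of the objects, given pairwise probabilities.

    p[(j, k)] is the probability that object j out-performs object k.
    Each ranking's numerator is the product of the pairwise probabilities it
    implies; dividing by the sum of all numerators normalizes them.
    """
    objects = sorted({j for j, _ in p})
    numerators = {}
    for order in permutations(objects):
        value = 1.0
        for i, higher in enumerate(order):
            for lower in order[i + 1:]:
                value *= p[(higher, lower)]
        numerators[order] = value
    total = sum(numerators.values())
    return {order: value / total for order, value in numerators.items()}


# Hypothetical pairwise probabilities for three objects a, b, c.
p = {("a", "b"): 0.7, ("a", "c"): 0.8, ("b", "c"): 0.6}
p.update({(k, j): 1.0 - v for (j, k), v in list(p.items())})

r = ranking_probabilities(p)
r_ab = sum(prob for order, prob in r.items()
           if order.index("a") < order.index("b"))

# Closed forms (9) and (14) from the text, for comparison.
denominator = (1 - p["a", "b"] * p["c", "a"] * p["b", "c"]
               - p["b", "a"] * p["a", "c"] * p["c", "b"])
r_ab_closed = p["a", "b"] * (1 - p["c", "a"] * p["b", "c"]) / denominator
r_abc_closed = p["a", "b"] * p["a", "c"] * p["b", "c"] / denominator
```

Summing the normalized probabilities of the rankings that place Oa above Ob
gives the same value as (9), and the probability of the ranking (Oa, Ob, Oc)
agrees with (14).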



IV. Rank ordering of n objects. 

For convenience of generalization, let us arbitrarily number the objects O1,
O2,.., On with corresponding parameters B1, B2,..,Bn. For some rank ordering
of the objects, R({n}),

$$ R(\{n\}) = \frac{\prod_{j=1}^{n}\prod_{k=j+1}^{n}\left(X_{jk}P_{jk} + X_{kj}P_{kj}\right)}{\Sigma R(\{n\})} \tag{16} $$

with the same conventions as before. In particular, ΣR({n}) is a sum
including one term for each of the possible numerators, identical to that
numerator. The number of possible numerators is the number of ways of
permuting n objects, that is n!.

The measurement model defining the relationship between objects Oj and
Ok is, rewriting (3) with Pkj = 1 - Pjk,

$$ P_{jk} = \frac{\exp(B_j)}{\exp(B_j) + \exp(B_k)} \tag{17} $$

and

$$ P_{kj} = \frac{\exp(B_k)}{\exp(B_j) + \exp(B_k)} \tag{18} $$

so that the probability of a rank ordering in terms of the underlying
parameters is

$$ R(\{n\}) = \frac{\prod_{j=1}^{n}\prod_{k=j+1}^{n} \dfrac{X_{jk}\exp(B_j) + X_{kj}\exp(B_k)}{\exp(B_j) + \exp(B_k)}}{\Sigma R(\{n\})} \tag{19} $$
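
A direct, if brute-force, way to evaluate (19) is to enumerate all n!
orderings. The sketch below is an illustration under stated assumptions: the
function names and the four measures are hypothetical, and the code uses the
fact that the factor 1/(exp(Bj)+exp(Bk)) appears in every numerator and so
cancels on normalization, leaving each pair to contribute exp(B) of its
higher-ranked object.

```python
import math
from itertools import permutations


def ranking_numerator(order, measures):
    """Numerator of (19) without the common factor 1/(exp(Bj)+exp(Bk)).

    Each pair contributes exp(B) of its higher-ranked object, so the product
    is exp of the sum of B times the number of objects ranked below.
    """
    exponent = 0.0
    for position, obj in enumerate(order):
        objects_below = len(order) - 1 - position
        exponent += objects_below * measures[obj]
    return math.exp(exponent)


def ranking_probability(order, measures):
    """Probability of one rank ordering, normalized over all n! orderings."""
    total = sum(ranking_numerator(perm, measures)
                for perm in permutations(order))
    return ranking_numerator(order, measures) / total


# Hypothetical measures (logits) for four objects.
measures = {"O1": 1.0, "O2": 0.3, "O3": -0.4, "O4": -0.9}
best = ranking_probability(("O1", "O2", "O3", "O4"), measures)
worst = ranking_probability(("O4", "O3", "O2", "O1"), measures)
all_probs = [ranking_probability(perm, measures)
             for perm in permutations(measures)]
```

Enumeration is practical only for small n, since the number of terms grows as
n!; for two objects the function reduces to the paired model (17).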



V. Independent rank orderings of n objects.

If independent rank orderings of the same n objects have been compiled
by T judges, then the likelihood of the data set becomes

$$ \Pi\{n\} = \prod_{r=1}^{T} \frac{\prod_{j=1}^{n}\prod_{k=j+1}^{n} \dfrac{X_{rjk}\exp(B_j) + X_{rkj}\exp(B_k)}{\exp(B_j) + \exp(B_k)}}{\Sigma R(\{n\})} \tag{20} $$

For estimability of all parameters in one frame of reference, it is required
that the orderings of the objects overlap in such a way that every object can
be compared to every other object, either directly or indirectly, in terms of
both relative successes and relative failures. If, for instance, one object
is always ranked highest, then its parameter is inestimable. A more subtle
example of inestimability is a set of orderings in which the objects form
two groups, the high group and the low group, and no object in the high
group is ever ranked below any object in the low group.
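
One way to screen a data set for this condition, offered here as a sketch and
not as the procedure used in this paper, is to treat "Oj ranked above Ok" as a
directed link from Oj to Ok and to require that every object can reach every
other object along such links. The function name `is_estimable` and the small
example data sets are hypothetical.

```python
def is_estimable(rankings):
    """Check that every object is linked to every other by 'ranked above' chains.

    rankings: list of orderings, each a sequence of object names, best first.
    An object that can never reach some other object has no relative success
    against it, directly or indirectly, so its measure cannot be located.
    """
    objects = {obj for order in rankings for obj in order}
    beats = {obj: set() for obj in objects}
    for order in rankings:
        for i, higher in enumerate(order):
            beats[higher].update(order[i + 1:])

    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            for nxt in beats[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return all(reachable(obj) == objects for obj in objects)


# Hypothetical examples of the two failures described in the text.
always_first = [("A", "B", "C"), ("A", "C", "B")]
two_groups = [("A", "B", "C", "D"), ("B", "A", "D", "C")]
well_mixed = [("A", "B", "C"), ("C", "A", "B"), ("B", "C", "A")]

checks = [is_estimable(data) for data in (always_first, two_groups, well_mixed)]
```

In the first example A is always ranked highest, and in the second no member
of the low group {C, D} is ever ranked above a member of the high group
{A, B}; only the third data set passes the check.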

If all objects do not participate in every rank ordering, the overall
likelihood becomes the product of the likelihood of homogeneous subgroups in
which the same set of objects has been ranked by one or more judges. Thus
if n objects have been ranked by T judges, and m objects (including some of
the n objects) have been ranked by S judges, then

$$ \Pi\{m \cup n\} = \left( \prod_{r=1}^{T} \frac{\prod_{j=1}^{n}\prod_{k=j+1}^{n} \dfrac{X_{rjk}\exp(B_j) + X_{rkj}\exp(B_k)}{\exp(B_j) + \exp(B_k)}}{\Sigma R(\{n\})} \right) \times \left( \prod_{r=1}^{S} \frac{\prod_{j=1}^{m}\prod_{k=j+1}^{m} \dfrac{X_{rjk}\exp(B_j) + X_{rkj}\exp(B_k)}{\exp(B_j) + \exp(B_k)}}{\Sigma R(\{m\})} \right) \tag{21} $$

where Π{m∪n} is the likelihood of the entire data set. The following
derivation can then be adapted to this formulation of the data, but,
for clarity, we return to the consideration of a homogeneous data set.

The factor

$$ \prod_{j=1}^{n}\prod_{k=j+1}^{n} \frac{1}{\exp(B_j) + \exp(B_k)} $$

is common to every term in the numerator and denominator of (20), and
so can be cancelled out. Thus (20) becomes

$$ \Pi\{n\} = \prod_{r=1}^{T} \frac{\prod_{j=1}^{n}\prod_{k=j+1}^{n}\left(X_{rjk}\exp(B_j) + X_{rkj}\exp(B_k)\right)}{\sum_{s=1}^{n!}\prod_{j=1}^{n}\prod_{k=j+1}^{n}\left(X_{sjk}\exp(B_j) + X_{skj}\exp(B_k)\right)} \tag{22} $$

The denominator includes all the possible numerators corresponding to all
valid rankings and so consists of n! terms corresponding to the n! ways of
ordering n objects.

Taking logarithms, the log-likelihood of a set of T rank orderings of
n objects is

$$ \log(\Pi\{n\}) = \lambda = \sum_{r=1}^{T}\sum_{j=1}^{n}\sum_{k=j+1}^{n} \log\left(X_{rjk}\exp(B_j) + X_{rkj}\exp(B_k)\right) - T \log\left( \sum_{s=1}^{n!}\prod_{j=1}^{n}\prod_{k=j+1}^{n}\left(X_{sjk}\exp(B_j) + X_{skj}\exp(B_k)\right) \right) \tag{23} $$
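
The log-likelihood (23) can be evaluated directly for small data sets. The
sketch below is illustrative only: the function name `log_likelihood`, the
three hypothetical judges and the trial measures are assumptions, and the
denominator is computed by enumerating all n! orderings.

```python
import math
from itertools import permutations


def log_likelihood(rankings, measures):
    """Log-likelihood (23) of T complete rank orderings of the same n objects.

    rankings: list of orderings (best first); measures: dict name -> B.
    """
    def log_numerator(order):
        # Each pair contributes B of its higher-ranked object, so an object
        # contributes B times the number of objects ranked below it.
        n = len(order)
        return sum((n - 1 - pos) * measures[obj]
                   for pos, obj in enumerate(order))

    objects = list(measures)
    log_denominator = math.log(
        sum(math.exp(log_numerator(perm)) for perm in permutations(objects))
    )
    observed = sum(log_numerator(order) for order in rankings)
    return observed - len(rankings) * log_denominator


# Hypothetical data: three judges ranking three objects.
rankings = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
flat = log_likelihood(rankings, {"a": 0.0, "b": 0.0, "c": 0.0})
tilted = log_likelihood(rankings, {"a": 0.5, "b": 0.0, "c": -0.5})
```

With all measures equal, every ordering has probability 1/6, so `flat` equals
-3 log 6; measures that favor the object ranked highest most often give a
larger log-likelihood for these data.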
VI. Estimation equations for rank ordered objects.

The Newton-Raphson estimation equations for the parameters can be obtained
using first and second derivatives of the log-likelihood function (23).

To estimate Ba, partially differentiate with respect to Ba,

$$ \frac{\partial\lambda}{\partial B_a} = \sum_{r=1}^{T}\sum_{\substack{j=1 \\ j\neq a}}^{n} X_{raj} \;-\; T\, \frac{\sum_{r=1}^{n!}\Big(\sum_{\substack{l=1 \\ l\neq a}}^{n} X_{ral}\Big)\prod_{j=1}^{n}\prod_{k=j+1}^{n}\left(X_{rjk}\exp(B_j) + X_{rkj}\exp(B_k)\right)}{\sum_{s=1}^{n!}\prod_{j=1}^{n}\prod_{k=j+1}^{n}\left(X_{sjk}\exp(B_j) + X_{skj}\exp(B_k)\right)} \tag{24} $$

The first term represents the observed score and is a count of the number of
objects higher than which Oa is ranked in all the observed rank orderings.
The second term represents the expected score: it is the sum, across all
possible rank orderings, of the number of objects higher than which Oa is
ranked in each rank ordering, multiplied by the probability of that rank
ordering, all multiplied by the number of rank orderings in the observed data.

Differentiating the log-likelihood again with respect to Ba,

$$ \frac{\partial^2\lambda}{\partial B_a^2} = T \left( \frac{\sum_{r=1}^{n!}\Big(\sum_{\substack{l=1 \\ l\neq a}}^{n} X_{ral}\Big)\prod_{j=1}^{n}\prod_{k=j+1}^{n}\left(X_{rjk}\exp(B_j) + X_{rkj}\exp(B_k)\right)}{\sum_{s=1}^{n!}\prod_{j=1}^{n}\prod_{k=j+1}^{n}\left(X_{sjk}\exp(B_j) + X_{skj}\exp(B_k)\right)} \right)^{2} - T \left( \frac{\sum_{r=1}^{n!}\Big(\sum_{\substack{l=1 \\ l\neq a}}^{n} X_{ral}\Big)^{2}\prod_{j=1}^{n}\prod_{k=j+1}^{n}\left(X_{rjk}\exp(B_j) + X_{rkj}\exp(B_k)\right)}{\sum_{s=1}^{n!}\prod_{j=1}^{n}\prod_{k=j+1}^{n}\left(X_{sjk}\exp(B_j) + X_{skj}\exp(B_k)\right)} \right) \tag{25} $$

This provides the specific form of the terms for the general form of the
Newton-Raphson estimation equation for B'a, the improved estimate of Ba,
which is the measure corresponding to object Oa,

$$ B'_a = B_a - \frac{\partial\lambda}{\partial B_a} \Big/ \frac{\partial^2\lambda}{\partial B_a^2} \tag{26} $$

When the iterative process has converged, the asymptotic standard error of
the estimate, Ba, is given by

$$ S.E.(B_a) = \sqrt{ -1 \Big/ \frac{\partial^2\lambda}{\partial B_a^2} } \tag{27} $$

Rasch model fit statistics, both information-weighted and outlier-sensitive,
can also be calculated (Wright and Masters 1982 p. 100).
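
The estimation equations can be sketched in code for small n by enumerating
all n! orderings. The following is a minimal illustration under stated
assumptions, not the program used for the analysis in this paper: the function
name `estimate_measures`, the centring of the measures on zero, the
convergence tolerance and the five hypothetical rankings are all choices made
here for the example. Each parameter is updated from its own first and second
derivative, as in (26).

```python
import math
from itertools import permutations


def estimate_measures(rankings, iterations=200, tolerance=1e-8):
    """Estimate B for each object from complete rank orderings (best first).

    Returns (measures, standard_errors), with measures centred on zero.
    """
    objects = sorted(rankings[0])
    n, t = len(objects), len(rankings)
    # Observed score: number of objects each object is ranked above, summed
    # over all judges (first term of (24)).
    observed = {obj: sum(n - 1 - order.index(obj) for order in rankings)
                for obj in objects}
    # below[s][obj] = number of objects ranked below obj in ordering s.
    below = [{obj: n - 1 - pos for pos, obj in enumerate(perm)}
             for perm in permutations(objects)]
    b = {obj: 0.0 for obj in objects}
    second = {}
    for _ in range(iterations):
        weights = [math.exp(sum(w[obj] * b[obj] for obj in objects))
                   for w in below]
        total = sum(weights)
        largest_step = 0.0
        new_b = {}
        for obj in objects:
            mean = sum(w[obj] * wt for w, wt in zip(below, weights)) / total
            mean_sq = sum(w[obj] ** 2 * wt
                          for w, wt in zip(below, weights)) / total
            first = observed[obj] - t * mean            # equation (24)
            second[obj] = t * mean ** 2 - t * mean_sq   # equation (25)
            step = -first / second[obj]                 # equation (26)
            new_b[obj] = b[obj] + step
            largest_step = max(largest_step, abs(step))
        centre = sum(new_b.values()) / n
        b = {obj: value - centre for obj, value in new_b.items()}
        if largest_step < tolerance:
            break
    errors = {obj: math.sqrt(-1.0 / second[obj]) for obj in objects}  # (27)
    return b, errors


# Hypothetical data: five judges ranking four objects, best first.
rankings = [
    ("A", "B", "C", "D"),
    ("B", "A", "C", "D"),
    ("A", "C", "B", "D"),
    ("A", "B", "D", "C"),
    ("D", "A", "B", "C"),
]
measures, errors = estimate_measures(rankings)
```

At convergence the expected score of each object equals its observed score, so
objects with equal observed scores (C and D in this hypothetical data set)
receive equal measures. Updating all parameters at once from their own
derivatives is a simplification; it is not suitable for n = 2, where the two
updates overshoot each other.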



VII. Tied rankings. 

In some judging situations, two or more objects may be given the same
ranking. If two objects Oj and Ok are given the same ranking, then this is
equivalent to the statement that orderings (Oj, Ok) and (Ok, Oj) are equally
probable as representations of the ordering of the objects on the latent
variable. Consequently, if orderings (Oj, Ok) and (Ok, Oj) are each given a
weighting of one-half, then the sum is equivalent to the tied ordering. Thus,
if Oj and Ok are tied in the ordering, then Xjk = 0.5 and Xkj = 0.5 for the
purposes of determining empirical scores. Considered in this way, the
admissibility of tied rankings does not add any more orderings into the
scheme of all possible rank orderings described above.
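
The scoring rule for ties can be illustrated with a short sketch. The function
name `empirical_scores` and the example ranks are hypothetical; the rule itself
(1 for the higher-ranked object of a pair, 0 for the lower, 0.5 each for a tie)
follows the description above.

```python
def empirical_scores(ranks):
    """Score each object from one judge's ranks, where ties share a rank number.

    ranks: dict name -> rank (1 = best). For each pair, X = 1 for the object
    ranked higher, 0 for the lower, and 0.5 each when the two are tied.
    """
    scores = {}
    for j, rank_j in ranks.items():
        score = 0.0
        for k, rank_k in ranks.items():
            if j == k:
                continue
            if rank_j < rank_k:
                score += 1.0
            elif rank_j == rank_k:
                score += 0.5
        scores[j] = score
    return scores


# Hypothetical judge who ties B and C for second place.
scores = empirical_scores({"A": 1, "B": 2, "C": 2, "D": 4})
```

The total score is still n(n-1)/2 = 6 for four objects, so a tied ranking
carries the same total information as an untied one.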



VIII. An application of the Rasch model for rank ordered objects.

In Polskin (1988), and reproduced in Table 2, are rankings of seven
play-by-play baseball announcers on six specific items of performance. For
the purposes of this analysis, the six rankings are considered to be
independent manifestations of the same latent abilities.



Rank  Calling the game      Broadcasting ability   Quality of anecdotes

1.    Vin Scully            Vin Scully             Vin Scully
2.    Bob Costas            Al Michaels            Bob Costas
3.    Al Michaels           Bob Costas             Al Michaels
4.    Skip Caray            Skip Caray             Skip Caray
5.    Harry Caray           Harry Caray            Ralph Kiner
6.    Steve Zabriskie       Steve Zabriskie        Harry Caray
7.    Ralph Kiner           Ralph Kiner            Steve Zabriskie

Rank  Working with analyst  Knowledge of baseball  Enthusiasm level

1.    Bob Costas            Vin Scully             Harry Caray
2.    Al Michaels           Ralph Kiner            Al Michaels
3.    Vin Scully            Bob Costas             Bob Costas
4.    Skip Caray            Al Michaels            Vin Scully
5.    Steve Zabriskie       Harry Caray            Steve Zabriskie
6.    Ralph Kiner           Skip Caray             Skip Caray
7.    Harry Caray           Steve Zabriskie        Ralph Kiner

Table 2. Rankings of Play-by-Play Announcers.



The Rasch rank-order measurement model can be used to answer such questions
as "How much better is one announcer than another?", "Which announcers have
the most consistent quality level?" and "Do the items cooperate in defining
one "Quality of Announcing" variable?"

In answer to the question, "How much better is one announcer than another?",
Table 3 lists the estimates of the measures obtained for this data set. The
relationship between the sum of each announcer's ranks and his measure is
close to linear, as can be seen by inspection of Figure 1. The most
consistently ranked announcer, with a mean-square fit statistic of 0.34, is
Al Michaels, and the one most inconsistently ranked is Ralph Kiner with a fit
of 2.12. The distinction between "information-weighted" fit statistics and
"outlier-sensitive" fit statistics does not exist here because the variance
term for each estimate is uniform across rank orderings. How difficult misfit
is to determine, by eye, from lists of rank orderings is indicated by the
different conclusion reached by Polskin. According to the analysis given in
the text of his article, Polskin had the impression that Harry Caray was the
least consistently ranked announcer, due to his first place on "Enthusiasm".

A basic question to the success of the measurement operation is the
unidimensionality of the "Quality of Announcing" variable. Are the six
orderings independent manifestations of the same latent parameters? Table 4
summarizes the degree of fit within each ordering. Since ordering provides
no information on, say, how difficult it is to "call the game", no difficulty
calibrations are shown.



Ability  Sum of    Measure          Mean-Square
Order    Rankings  (Logits)  S.E.   Fit Statistic  Announcer

1        11         0.98     0.41   1.51           Vin Scully
2        14         0.67     0.35   0.40           Bob Costas
3        16         0.50     0.32   0.34           Al Michaels
4        28        -0.26     0.28   0.43           Skip Caray
5        29        -0.33     0.29   1.73           Harry Caray
6        34        -0.69     0.33   2.12           Ralph Kiner
7        36        -0.87     0.37   0.54           Steve Zabriskie

Mean:               0.00            1.01

Table 3. Ability of Baseball Announcers
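
The "Sum of Rankings" column of Table 3 can be recomputed from the six
orderings in Table 2. The sketch below transcribes those orderings (reading the
sixth place on "Working with analyst" as Ralph Kiner) and adds up each
announcer's rank across the six items; the variable names are illustrative
choices.

```python
# Rankings transcribed from Table 2, best first, one list per item.
TABLE_2 = {
    "Calling the game": [
        "Vin Scully", "Bob Costas", "Al Michaels", "Skip Caray",
        "Harry Caray", "Steve Zabriskie", "Ralph Kiner"],
    "Broadcasting ability": [
        "Vin Scully", "Al Michaels", "Bob Costas", "Skip Caray",
        "Harry Caray", "Steve Zabriskie", "Ralph Kiner"],
    "Quality of anecdotes": [
        "Vin Scully", "Bob Costas", "Al Michaels", "Skip Caray",
        "Ralph Kiner", "Harry Caray", "Steve Zabriskie"],
    "Working with analyst": [
        "Bob Costas", "Al Michaels", "Vin Scully", "Skip Caray",
        "Steve Zabriskie", "Ralph Kiner", "Harry Caray"],
    "Knowledge of baseball": [
        "Vin Scully", "Ralph Kiner", "Bob Costas", "Al Michaels",
        "Harry Caray", "Skip Caray", "Steve Zabriskie"],
    "Enthusiasm level": [
        "Harry Caray", "Al Michaels", "Bob Costas", "Vin Scully",
        "Steve Zabriskie", "Skip Caray", "Ralph Kiner"],
}

# Add up each announcer's rank (1 = best) over the six items.
rank_sums = {}
for order in TABLE_2.values():
    for position, announcer in enumerate(order, start=1):
        rank_sums[announcer] = rank_sums.get(announcer, 0) + position

# Announcers sorted from lowest rank sum (best) to highest.
ability_order = sorted(rank_sums, key=rank_sums.get)
```

The resulting sums are 11, 14, 16, 28, 29, 34 and 36, and sorting by them
reproduces the ability order shown in Table 3.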



Information-weighted  Outlier-sensitive  Name of
Mean-Square fit       Mean-Square fit    Ordering Item

0.29                  0.32               Calling the game
0.35                  0.39               Broadcasting ability
0.38                  0.41               Quality of anecdotes
0.91                  0.91               Working with analyst
1.77                  1.80               Knowledge of baseball
2.29                  2.22               Enthusiasm level

Table 4. Fit statistics for items as manifested in the rank orderings.



The items are generally acting in a coherent manner in defining the
variable. It may well be that "Calling the game" and "Broadcasting ability"
are somewhat synonymous and not independent items, leading to a redundancy in
the data. "Enthusiasm" displays the most misfit, and may be
multi-dimensional. This is because, according to Polskin, it is easier to
announce when you are doing it for the "home-team fans", as Harry Caray does.

Table 5 shows those rankings which were the least expected. This Table is an
aid to the diagnosis of aberrations in the measuring process, which Polskin's
analysis apparently lacked, since he failed to comment on the most unexpected
ranking, that of Ralph Kiner on "Knowledge".



Ordering    Announcer    Rank  Expected  Difference  S.E.  Z-Score

Knowledge   Ralph Kiner   2    5.7        3.67       1.23   2.97
Enthusiasm  Harry Caray   1    4.8        3.83       1.42   2.71
Working     Bob Costas    1    2.3        1.3?       1.18   1.13
Working     Vin Scully    3    1.8       -1.17       0.99  -1.18
Working     Harry Caray   7    4.8       -2.17       1.12  -1.53
Enthusiasm  Vin Scully    4    1.8       -2.17       0.99  -2.19

Mean for all ranks:                       0.0               0.00
Variance for all ranks:                                     1.00

Table 5. Most unexpected rankings of announcers arranged in Z-score order.



IX. Conclusion. 

The application of the principles of fundamental measurement to rank ordered
data has provided the means to convert entirely local rankings into
generalisable measures of the latent abilities. Moreover, fit statistics for
each object and for each ordering enable a determination of the success of
the ranking process as a measurement operation.

Figure 1. Announcers' Measures plotted against sum of rankings.

M = estimated Measure of the announcer.
H = M + standard error, L = M - standard error.